{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "18 - Export and run models with ONNX",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4Pjmz-RORV8E"
      },
      "source": [
        "# Export and run models with ONNX\n",
        "\n",
        "[ONNX](https://onnx.ai/) provides a common serialization format for machine learning models. ONNX Runtime supports a number of [different platforms/languages](https://onnxruntime.ai/docs/how-to/install.html#requirements) and has features built in to help reduce inference time.\n",
        "\n",
        "PyTorch has robust support for exporting Torch models to ONNX. This enables exporting Hugging Face Transformers models and other downstream models directly to ONNX.\n",
        "\n",
        "ONNX opens an avenue for direct inference using a number of languages and platforms. For example, a model could be run directly on Android to limit data sent to a third-party service. ONNX is an exciting development with a lot of promise. Microsoft has also released [Hummingbird](https://github.com/microsoft/hummingbird), which enables exporting traditional models (sklearn, decision trees, logistic regression, ...) to ONNX.\n",
        "\n",
        "This notebook will cover how to export models to ONNX using txtai. These models will then be directly run in Python, JavaScript, Java and Rust. Currently, txtai supports all these languages through its API and that is still the recommended approach."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Dk31rbYjSTYm"
      },
      "source": [
        "# Install dependencies\n",
        "\n",
        "Install `txtai` and all dependencies. Since this notebook uses ONNX quantization, we need to install the pipeline extras package."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XMQuuun2R06J"
      },
      "source": [
        "%%capture\n",
        "!pip install datasets git+https://github.com/neuml/txtai#egg=txtai[pipeline]"
      ],
      "execution_count": 31,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PNPJ95cdTKSS"
      },
      "source": [
        "# Run a model with ONNX\n",
        "\n",
        "Let's get right to it! The following example exports a sentiment analysis model to ONNX and runs an inference session.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "USb4JXZHxqTA",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "b32db17b-2563-4960-c1bc-fc1193172ea4"
      },
      "source": [
        "import numpy as np\n",
        "\n",
        "from onnxruntime import InferenceSession, SessionOptions\n",
        "from transformers import AutoTokenizer\n",
        "from txtai.pipeline import HFOnnx\n",
        "\n",
        "# Normalize logits using sigmoid function\n",
        "sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))\n",
        "\n",
        "# Export to ONNX\n",
        "onnx = HFOnnx()\n",
        "model = onnx(\"distilbert-base-uncased-finetuned-sst-2-english\", \"text-classification\")\n",
        "\n",
        "# Start inference session\n",
        "options = SessionOptions()\n",
        "session = InferenceSession(model, options)\n",
        "\n",
        "# Tokenize\n",
        "tokenizer = AutoTokenizer.from_pretrained(\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
        "tokens = tokenizer([\"I am happy\", \"I am mad\"], return_tensors=\"np\")\n",
        "\n",
        "# Print results\n",
        "outputs = session.run(None, dict(tokens))\n",
        "print(sigmoid(outputs[0]))"
      ],
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[[0.01295124 0.9909526 ]\n",
            " [0.9874723  0.0297817 ]]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jkmQoQvlmHfQ"
      },
      "source": [
        "And just like that, there are results! The text classification model is judging sentiment using two labels, 0 for negative and 1 for positive. The results above show a sigmoid-normalized score between 0 and 1 for each label per text snippet.\n",
        "\n",
        "The ONNX pipeline loads the model, converts the graph to ONNX and returns the result. Note that no output file was provided. In this case, the ONNX model is returned as a byte array. If an output file is provided, this method returns the output path."
      ]
    },
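    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The example above normalizes the logits with a sigmoid, which scores each label independently, so the scores in a row need not sum to 1. The following is a minimal sketch, not part of txtai, that compares sigmoid and softmax on hypothetical logits. The logit values and the helper names are assumptions made up for illustration.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def sigmoid(x):\n",
        "    # Scores each logit independently; rows need not sum to 1\n",
        "    return 1.0 / (1.0 + np.exp(-x))\n",
        "\n",
        "def softmax(x):\n",
        "    # Subtract the row max for numerical stability, then normalize each row\n",
        "    e = np.exp(x - x.max(axis=-1, keepdims=True))\n",
        "    return e / e.sum(axis=-1, keepdims=True)\n",
        "\n",
        "# Hypothetical logits for two snippets (rows) and two labels (columns)\n",
        "logits = np.array([[-4.3, 4.7], [4.4, -3.5]])\n",
        "\n",
        "print(sigmoid(logits))\n",
        "print(softmax(logits).sum(axis=-1))\n",
        "```\n",
        "\n",
        "Both functions rank the labels the same way. Only softmax returns values that sum to 1 across the labels of each snippet."
      ]
    },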
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yFAOHVmXml8o"
      },
      "source": [
        "# Train and Export a model for Text Classification\n",
        "\n",
        "Next we'll combine the ONNX pipeline with a Trainer pipeline to create a \"train and export to ONNX\" workflow."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 398
        },
        "id": "Wh8TkszumlIe",
        "outputId": "f1fd81e5-5e4b-4a46-9ed7-33aacd20b862"
      },
      "source": [
        "from datasets import load_dataset\n",
        "from txtai.pipeline import HFTrainer\n",
        "\n",
        "trainer = HFTrainer()\n",
        "\n",
        "# Hugging Face dataset\n",
        "ds = load_dataset(\"glue\", \"sst2\")\n",
        "data = ds[\"train\"].select(range(5000)).flatten_indices()\n",
        "\n",
        "# Train new model using 5,000 SST2 records (in-memory)\n",
        "model, tokenizer = trainer(\"google/electra-base-discriminator\", data, columns=(\"sentence\", \"label\"))\n",
        "\n",
        "# Export model trained in-memory to ONNX (still in-memory)\n",
        "output = onnx((model, tokenizer), \"text-classification\", quantize=True)\n",
        "\n",
        "# Start inference session\n",
        "options = SessionOptions()\n",
        "session = InferenceSession(output, options)\n",
        "\n",
        "# Tokenize\n",
        "tokens = tokenizer([\"I am happy\", \"I am mad\"], return_tensors=\"np\")\n",
        "\n",
        "# Print results\n",
        "outputs = session.run(None, dict(tokens))\n",
        "print(sigmoid(outputs[0]))"
      ],
      "execution_count": 33,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "Reusing dataset glue (/root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)\n",
            "Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-0ad799e068ae6c27.arrow\n",
            "Loading cached processed dataset at /root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad/cache-422ba3363aae2656.arrow\n",
            "Some weights of the model checkpoint at google/electra-base-discriminator were not used when initializing ElectraForSequenceClassification: ['discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense.weight']\n",
            "- This IS expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
            "- This IS NOT expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
            "Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at google/electra-base-discriminator and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.bias']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/html": [
              "\n",
              "    <div>\n",
              "      \n",
              "      <progress value='1875' max='1875' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
              "      [1875/1875 05:47, Epoch 3/3]\n",
              "    </div>\n",
              "    <table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: left;\">\n",
              "      <th>Step</th>\n",
              "      <th>Training Loss</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>500</td>\n",
              "      <td>0.409200</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1000</td>\n",
              "      <td>0.269900</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1500</td>\n",
              "      <td>0.153200</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table><p>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {}
        },
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[[0.02424305 0.9557785 ]\n",
            " [0.95884305 0.05541185]]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lE7dPj3tsn5S"
      },
      "source": [
        "The results are similar to the previous step, although this model is only trained on a fraction of the sst2 dataset. Let's save this model for later."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Q_kAFYd_s_Bi"
      },
      "source": [
        "text = onnx((model, tokenizer), \"text-classification\", \"text-classify.onnx\", quantize=True)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ugNZO4c-uAS-"
      },
      "source": [
        "# Export a Sentence Embeddings model\n",
        "\n",
        "The ONNX pipeline also supports exporting sentence embeddings models trained with the [sentence-transformers](https://github.com/UKPLab/sentence-transformers) package. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "x9B7qOk_uQRN"
      },
      "source": [
        "embeddings = onnx(\"sentence-transformers/paraphrase-MiniLM-L6-v2\", \"pooling\", \"embeddings.onnx\", quantize=True)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rirMSM2kvgJF"
      },
      "source": [
        "Now let's run the model with ONNX."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "6MBraENcu8Oz",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "96f5e4ad-bb7d-4afe-fb7e-0ae2c76faed5"
      },
      "source": [
        "from sklearn.metrics.pairwise import cosine_similarity\n",
        "\n",
        "options = SessionOptions()\n",
        "session = InferenceSession(embeddings, options)\n",
        "\n",
        "tokens = tokenizer([\"I am happy\", \"I am glad\"], return_tensors=\"np\")\n",
        "\n",
        "outputs = session.run(None, dict(tokens))[0]\n",
        "\n",
        "print(cosine_similarity(outputs))"
      ],
      "execution_count": 36,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[[1.0000002 0.8430618]\n",
            " [0.8430618 1.       ]]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pwgU4vu8vk0T"
      },
      "source": [
        "The code above tokenizes two separate text snippets (\"I am happy\" and \"I am glad\") and runs them through the ONNX model.\n",
        "\n",
        "This outputs two embeddings arrays, which are then compared using cosine similarity. As we can see, the two text snippets have close semantic meaning."
      ]
    },
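    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For reference, cosine similarity is the dot product of two vectors divided by the product of their norms. The following is a minimal sketch using NumPy only. The `cosine` helper and the small 3-dimensional vectors are hypothetical stand-ins for real embeddings, chosen so the behavior is easy to check by hand.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def cosine(a, b):\n",
        "    # Dot product divided by the product of the vector norms\n",
        "    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))\n",
        "\n",
        "# Hypothetical vectors standing in for real embeddings\n",
        "v1 = np.array([1.0, 2.0, 3.0])\n",
        "v2 = np.array([2.0, 4.0, 6.0])     # same direction as v1\n",
        "v3 = np.array([-1.0, -2.0, -3.0])  # opposite direction to v1\n",
        "\n",
        "print(cosine(v1, v2))\n",
        "print(cosine(v1, v3))\n",
        "```\n",
        "\n",
        "Vectors pointing the same way score close to 1 and vectors pointing in opposite directions score close to -1, regardless of their length."
      ]
    },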
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "t_OQaQeIb7UB"
      },
      "source": [
        "# Load an ONNX model with txtai\n",
        "\n",
        "txtai has built-in support for ONNX models. Loading an ONNX model is seamless, and both Embeddings and Pipelines support it. The following section shows how to load a classification pipeline and an embeddings model backed by ONNX."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "vhsFzCRBby-h",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "54d336ed-e9d6-4f65-969e-1f318cde4757"
      },
      "source": [
        "from txtai.embeddings import Embeddings\n",
        "from txtai.pipeline import Labels\n",
        "\n",
        "labels = Labels((\"text-classify.onnx\", \"google/electra-base-discriminator\"), dynamic=False)\n",
        "print(labels([\"I am happy\", \"I am mad\"]))\n",
        "\n",
        "embeddings = Embeddings({\"path\": \"embeddings.onnx\", \"tokenizer\": \"sentence-transformers/paraphrase-MiniLM-L6-v2\"})\n",
        "print(embeddings.similarity(\"I am happy\", [\"I am glad\"]))"
      ],
      "execution_count": 37,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[[(1, 0.9988517761230469), (0, 0.0011482156114652753)], [(0, 0.997488260269165), (1, 0.0025116782635450363)]]\n",
            "[(0, 0.8581848740577698)]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Xx8G29hkwdNY"
      },
      "source": [
        "# JavaScript\n",
        "\n",
        "So far, we've exported models to ONNX and run them through Python. This already has a lot of advantages, which include fast inference times, quantization and fewer software dependencies. But ONNX really shines when we run a model trained in Python in other languages/platforms.\n",
        "\n",
        "Let's try running the models trained above in JavaScript. The first step is getting the Node.js environment and dependencies set up.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "_RK79O9c4Z_y"
      },
      "source": [
        "%%capture\n",
        "import os\n",
        "\n",
        "!mkdir js\n",
        "os.chdir(\"/content/js\")"
      ],
      "execution_count": 38,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0HtVEl74xrZ7",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "b205fb46-b5cd-4461-e81a-d4504767c793"
      },
      "source": [
        "%%writefile package.json\n",
        "{\n",
        "  \"name\": \"onnx-test\",\n",
        "  \"private\": true,\n",
        "  \"version\": \"1.0.0\",\n",
        "  \"description\": \"ONNX Runtime Node.js test\",\n",
        "  \"main\": \"index.js\",\n",
        "  \"dependencies\": {\n",
        "    \"onnxruntime-node\": \">=1.8.0\",\n",
        "    \"tokenizers\": \"file:tokenizers/bindings/node\"\n",
        "  }\n",
        "}"
      ],
      "execution_count": 39,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Overwriting package.json\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "NCfV3GL0wt65"
      },
      "source": [
        "%%capture\n",
        "# Copy ONNX models\n",
        "!cp ../text-classify.onnx .\n",
        "!cp ../embeddings.onnx .\n",
        "\n",
        "# Save copy of Bert Tokenizer\n",
        "tokenizer.save_pretrained(\"bert\")\n",
        "\n",
        "# Get tokenizers project\n",
        "!git clone https://github.com/huggingface/tokenizers.git\n",
        "\n",
        "os.chdir(\"/content/js/tokenizers/bindings/node\")\n",
        "\n",
        "# Install Rust\n",
        "!apt-get install rustc\n",
        "\n",
        "# Build tokenizers project locally as version on NPM isn't working properly for latest version of Node.js\n",
        "!npm install --also=dev\n",
        "!npm run dev\n",
        "\n",
        "# Install all dependencies\n",
        "os.chdir(\"/content/js\")\n",
        "!npm install"
      ],
      "execution_count": 40,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "At85iA8U63iV"
      },
      "source": [
        "Next we'll write the inference code in JavaScript to an index.js file."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RImohEnFyFg0",
        "cellView": "form",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "7409644e-5c18-4596-9197-bc5ff6491a2a"
      },
      "source": [
        "#@title\n",
        "%%writefile index.js\n",
        "const ort = require('onnxruntime-node');\n",
        "const { promisify } = require('util');\n",
        "const { Tokenizer } = require(\"tokenizers/dist/bindings/tokenizer\");\n",
        "\n",
        "function sigmoid(data) {\n",
        "    return data.map(x => 1 / (1 + Math.exp(-x)))\n",
        "}\n",
        "\n",
        "function softmax(data) { \n",
        "    return data.map(x => Math.exp(x) / (data.map(y => Math.exp(y))).reduce((a,b) => a+b)) \n",
        "}\n",
        "\n",
        "function similarity(v1, v2) {\n",
        "    let dot = 0.0;\n",
        "    let norm1 = 0.0;\n",
        "    let norm2 = 0.0;\n",
        "\n",
        "    for (let x = 0; x < v1.length; x++) {\n",
        "        dot += v1[x] * v2[x];\n",
        "        norm1 += Math.pow(v1[x], 2);\n",
        "        norm2 += Math.pow(v2[x], 2);\n",
        "    }\n",
        "\n",
        "    return dot / (Math.sqrt(norm1) * Math.sqrt(norm2));\n",
        "}\n",
        "\n",
        "function tokenizer(path) {\n",
        "    let tokenizer = Tokenizer.fromFile(path);\n",
        "    return promisify(tokenizer.encode.bind(tokenizer));\n",
        "}\n",
        "\n",
        "async function predict(session, text) {\n",
        "    try {\n",
        "        // Tokenize input\n",
        "        let encode = tokenizer(\"bert/tokenizer.json\");\n",
        "        let output = await encode(text);\n",
        "\n",
        "        let ids = output.getIds().map(x => BigInt(x))\n",
        "        let mask = output.getAttentionMask().map(x => BigInt(x))\n",
        "        let tids = output.getTypeIds().map(x => BigInt(x))\n",
        "\n",
        "        // Convert inputs to tensors    \n",
        "        let tensorIds = new ort.Tensor('int64', BigInt64Array.from(ids), [1, ids.length]);\n",
        "        let tensorMask = new ort.Tensor('int64', BigInt64Array.from(mask), [1, mask.length]);\n",
        "        let tensorTids = new ort.Tensor('int64', BigInt64Array.from(tids), [1, tids.length]);\n",
        "\n",
        "        let inputs = null;\n",
        "        if (session.inputNames.length > 2) {\n",
        "            inputs = { input_ids: tensorIds, attention_mask: tensorMask, token_type_ids: tensorTids};\n",
        "        }\n",
        "        else {\n",
        "            inputs = { input_ids: tensorIds, attention_mask: tensorMask};\n",
        "        }\n",
        "\n",
        "        return await session.run(inputs);\n",
        "    } catch (e) {\n",
        "        console.error(`failed to inference ONNX model: ${e}.`);\n",
        "    }\n",
        "}\n",
        "\n",
        "async function main() {\n",
        "    let args = process.argv.slice(2);\n",
        "    if (args.length > 1) {\n",
        "        // Run sentence embeddings\n",
        "        const session = await ort.InferenceSession.create('./embeddings.onnx');\n",
        "\n",
        "        let v1 = await predict(session, args[0]);\n",
        "        let v2 = await predict(session, args[1]);\n",
        "\n",
        "        // Unpack results\n",
        "        v1 = v1.embeddings.data;\n",
        "        v2 = v2.embeddings.data;\n",
        "\n",
        "        // Print similarity\n",
        "        console.log(similarity(Array.from(v1), Array.from(v2)));\n",
        "    }\n",
        "    else {\n",
        "        // Run text classifier\n",
        "        const session = await ort.InferenceSession.create('./text-classify.onnx');\n",
        "        let results = await predict(session, args[0]);\n",
        "\n",
        "        // Normalize results using softmax and print\n",
        "        console.log(softmax(results.logits.data));\n",
        "    }\n",
        "}\n",
        "\n",
        "main();"
      ],
      "execution_count": 41,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Overwriting index.js\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rZI9PJzi6_bO"
      },
      "source": [
        "## Run Text Classification in JavaScript with ONNX"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "bdz68KZT1Jfm",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "3b388d33-56c2-4e4e-a872-f484d9e1b847"
      },
      "source": [
        "!node . \"I am happy\"\n",
        "!node . \"I am mad\""
      ],
      "execution_count": 42,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Float32Array(2) [ \u001b[33m0.001104647060856223\u001b[39m, \u001b[33m0.9988954067230225\u001b[39m ]\n",
            "Float32Array(2) [ \u001b[33m0.9976443648338318\u001b[39m, \u001b[33m0.00235558208078146\u001b[39m ]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "swSEmqto33VP"
      },
      "source": [
        "First off, have to say this is 🔥🔥🔥! Just amazing that this model can be fully run in JavaScript. It's a great time to be in NLP!\n",
        "\n",
        "The steps above installed a JavaScript environment with dependencies to run ONNX and tokenize data in JavaScript. The text classification model previously created is loaded into the JavaScript ONNX runtime and inference is run.\n",
        "\n",
        "As a reminder, the text classification model is judging sentiment using two labels, 0 for negative and 1 for positive. The results above show the probability of each label per text snippet."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5Az9YaDc6u9P"
      },
      "source": [
        "## Build sentence embeddings and compare similarity in JavaScript with ONNX"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "10jcUbUx6MAI",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "aa4ca007-159e-4206-fcc4-af0530959f37"
      },
      "source": [
        "!node . \"I am happy\", \"I am glad\""
      ],
      "execution_count": 43,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[33m0.8414919420066624\u001b[39m\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8Jyk-9Ko78Ma"
      },
      "source": [
        "Once again... wow!! The sentence embeddings model produces vectors that can be used to compare semantic similarity, -1 being most dissimilar and 1 being most similar.\n",
        "\n",
        "While the results don't match the exported model exactly, they're very close. Worth mentioning again that this is 100% JavaScript, no API or remote calls, all within Node.js."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BQeMBNWO9Hpr"
      },
      "source": [
        "# Java\n",
        "\n",
        "Let's try the same thing with Java. The following sections initialize a Java build environment and write out the code necessary to run the ONNX inference."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "9wlVWVky9NZ3"
      },
      "source": [
        "%%capture\n",
        "import os\n",
        "\n",
        "os.chdir(\"/content\")\n",
        "!mkdir java\n",
        "os.chdir(\"/content/java\")\n",
        "\n",
        "# Copy ONNX models\n",
        "!cp ../text-classify.onnx .\n",
        "!cp ../embeddings.onnx .\n",
        "\n",
        "# Save copy of Bert Tokenizer\n",
        "tokenizer.save_pretrained(\"bert\")\n",
        "\n",
        "!mkdir -p src/main/java\n",
        "\n",
        "# Install gradle\n",
        "!wget https://services.gradle.org/distributions/gradle-7.2-bin.zip\n",
        "!unzip -o gradle-7.2-bin.zip\n",
        "!gradle-7.2/bin/gradle wrapper"
      ],
      "execution_count": 44,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "gjZ2p7Jf9mOV",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "a5e6483e-322a-42e1-eba8-2cee9c94a39b"
      },
      "source": [
        "%%writefile build.gradle\n",
        "apply plugin: \"java\"\n",
        "\n",
        "repositories {\n",
        "    mavenCentral()\n",
        "}\n",
        "\n",
        "dependencies {\n",
        "    implementation \"com.robrua.nlp:easy-bert:1.0.3\"\n",
        "    implementation \"com.microsoft.onnxruntime:onnxruntime:1.8.1\"\n",
        "}\n",
        "\n",
        "java {\n",
        "    toolchain {\n",
        "        languageVersion = JavaLanguageVersion.of(8)\n",
        "    }\n",
        "}\n",
        "\n",
        "jar {\n",
        "    archiveBaseName = \"onnxjava\"\n",
        "}\n",
        "\n",
        "task onnx(type: JavaExec) {\n",
        "    description = \"Runs ONNX demo\"\n",
        "    classpath = sourceSets.main.runtimeClasspath\n",
        "    main = \"OnnxDemo\"\n",
        "}"
      ],
      "execution_count": 45,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Overwriting build.gradle\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "form",
        "id": "vnxKGSuz_fnj",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "ae79fd32-c6f3-49ec-fcd3-7d09a5d2997a"
      },
      "source": [
        "#@title\n",
        "%%writefile src/main/java/OnnxDemo.java\n",
        "import java.io.File;\n",
        "\n",
        "import java.nio.LongBuffer;\n",
        "\n",
        "import java.util.Arrays;\n",
        "import java.util.ArrayList;\n",
        "import java.util.HashMap;\n",
        "import java.util.List;\n",
        "import java.util.Map;\n",
        "\n",
        "import ai.onnxruntime.OnnxTensor;\n",
        "import ai.onnxruntime.OrtEnvironment;\n",
        "import ai.onnxruntime.OrtSession;\n",
        "import ai.onnxruntime.OrtSession.Result;\n",
        "\n",
        "import com.robrua.nlp.bert.FullTokenizer;\n",
        "\n",
        "class Tokens {\n",
        "    public long[] ids;\n",
        "    public long[] mask;\n",
        "    public long[] types;\n",
        "}\n",
        "\n",
        "class Tokenizer {\n",
        "    private FullTokenizer tokenizer;\n",
        "\n",
        "    public Tokenizer(String path) {\n",
        "        File vocab = new File(path);\n",
        "        this.tokenizer = new FullTokenizer(vocab, true);\n",
        "    }\n",
        "\n",
        "    public Tokens tokenize(String text) {\n",
        "        // Build list of tokens\n",
        "        List<String> tokensList = new ArrayList();\n",
        "        tokensList.add(\"[CLS]\"); \n",
        "        tokensList.addAll(Arrays.asList(tokenizer.tokenize(text)));\n",
        "        tokensList.add(\"[SEP]\");\n",
        "\n",
        "        int[] ids = tokenizer.convert(tokensList.toArray(new String[0]));\n",
        "\n",
        "        Tokens tokens = new Tokens();\n",
        "\n",
        "        // input ids    \n",
        "        tokens.ids = Arrays.stream(ids).mapToLong(i -> i).toArray();\n",
        "\n",
        "        // attention mask\n",
        "        tokens.mask = new long[ids.length];\n",
        "        Arrays.fill(tokens.mask, 1);\n",
        "\n",
        "        // token type ids\n",
        "        tokens.types = new long[ids.length];\n",
        "        Arrays.fill(tokens.types, 0);\n",
        "\n",
        "        return tokens;\n",
        "    }\n",
        "}\n",
        "\n",
        "class Inference {\n",
        "    private Tokenizer tokenizer;\n",
        "    private OrtEnvironment env;\n",
        "    private OrtSession session;\n",
        "\n",
        "    public Inference(String model) throws Exception {\n",
        "        this.tokenizer = new Tokenizer(\"bert/vocab.txt\");\n",
        "        this.env = OrtEnvironment.getEnvironment();\n",
        "        this.session = env.createSession(model, new OrtSession.SessionOptions());\n",
        "    }\n",
        "\n",
        "    public float[][] predict(String text) throws Exception {\n",
        "        Tokens tokens = this.tokenizer.tokenize(text);\n",
        "\n",
        "        Map<String, OnnxTensor> inputs = new HashMap<String, OnnxTensor>();\n",
        "        inputs.put(\"input_ids\", OnnxTensor.createTensor(env, LongBuffer.wrap(tokens.ids),  new long[]{1, tokens.ids.length}));\n",
        "        inputs.put(\"attention_mask\", OnnxTensor.createTensor(env, LongBuffer.wrap(tokens.mask),  new long[]{1, tokens.mask.length}));\n",
        "        inputs.put(\"token_type_ids\", OnnxTensor.createTensor(env, LongBuffer.wrap(tokens.types),  new long[]{1, tokens.types.length}));\n",
        "\n",
        "        return (float[][])session.run(inputs).get(0).getValue();\n",
        "    }\n",
        "}\n",
        "\n",
        "class Vectors {\n",
        "    public static double similarity(float[] v1, float[] v2) {\n",
        "        double dot = 0.0;\n",
        "        double norm1 = 0.0;\n",
        "        double norm2 = 0.0;\n",
        "\n",
        "        for (int x = 0; x < v1.length; x++) {\n",
        "            dot += v1[x] * v2[x];\n",
        "            norm1 += Math.pow(v1[x], 2);\n",
        "            norm2 += Math.pow(v2[x], 2);\n",
        "        }\n",
        "\n",
        "        return dot / (Math.sqrt(norm1) * Math.sqrt(norm2));\n",
        "    }\n",
        "\n",
        "    public static float[] softmax(float[] input) {\n",
        "        double[] t = new double[input.length];\n",
        "        double sum = 0.0;\n",
        "\n",
        "        for (int x = 0; x < input.length; x++) {\n",
        "            double val = Math.exp(input[x]);\n",
        "            sum += val;\n",
        "            t[x] = val;\n",
        "        }\n",
        "\n",
        "        float[] output = new float[input.length];\n",
        "        for (int x = 0; x < output.length; x++) {\n",
        "            output[x] = (float) (t[x] / sum);\n",
        "        }\n",
        "\n",
        "        return output;\n",
        "    }\n",
        "}\n",
        "\n",
        "public class OnnxDemo {\n",
        "    public static void main(String[] args) {\n",
        "        try {\n",
        "            if (args.length < 2) {\n",
        "              Inference inference = new Inference(\"text-classify.onnx\");\n",
        "\n",
        "              float[][] v1 = inference.predict(args[0]);\n",
        "\n",
        "              System.out.println(Arrays.toString(Vectors.softmax(v1[0])));\n",
        "            }\n",
        "            else {\n",
        "              Inference inference = new Inference(\"embeddings.onnx\");\n",
        "              float[][] v1 = inference.predict(args[0]);\n",
        "              float[][] v2 = inference.predict(args[1]);\n",
        "\n",
        "              System.out.println(Vectors.similarity(v1[0], v2[0]));\n",
        "            }\n",
        "        }\n",
        "        catch (Exception ex) {\n",
        "            ex.printStackTrace();\n",
        "        }\n",
        "    }\n",
        "}"
      ],
      "execution_count": 46,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Overwriting src/main/java/OnnxDemo.java\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qQuuXw97Z_I7"
      },
      "source": [
        "## Run Text Classification in Java with ONNX"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hFXyH96gAZpu",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "78701d6e-e4e7-4e5c-da2f-a29695030362"
      },
      "source": [
        "!./gradlew -q --console=plain onnx --args='\"I am happy\"' 2> /dev/null\n",
        "!./gradlew -q --console=plain onnx --args='\"I am mad\"' 2> /dev/null"
      ],
      "execution_count": 47,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[0.0011046471, 0.99889535]\n",
            "\u001b[m[0.9976444, 0.002355582]\n",
            "\u001b[m"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "pE3FSsAAaJHe"
      },
      "source": [
        "The command above tokenizes the input and runs inference with a text classification model previously created using a Java ONNX inference session. \n",
        "\n",
        "As a reminder, the text classification model is judging sentiment using two labels, 0 for negative to 1 for positive. The results above shows the probability of each label per text snippet."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Bux8v0C4aDyP"
      },
      "source": [
        "## Build sentence embeddings and compare similarity in Java with ONNX"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "f6zE9VrwCcUa",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "d6b304f3-151d-4ef0-978a-19e51e2b3fe0"
      },
      "source": [
        "!./gradlew -q --console=plain onnx --args='\"I am happy\" \"I am glad\"' 2> /dev/null"
      ],
      "execution_count": 48,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "0.8581848568615768\n",
            "\u001b[m"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0uepOZvJDOCB"
      },
      "source": [
        "The sentence embeddings model produces vectors that can be used to compare semantic similarity, -1 being most dissimilar and 1 being most similar. \n",
        "\n",
        "This is 100% Java, no API or remote calls, all within the JVM. Still think it's amazing!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "faRu9EAJDUXw"
      },
      "source": [
        "# Rust\n",
        "\n",
        "Last but not least, let's try Rust. The following sections initialize a Rust build environment and writes out the code necessary to run the ONNX inference."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "X3Xp1KLhelqw"
      },
      "source": [
        "%%capture\n",
        "import os\n",
        "\n",
        "os.chdir(\"/content\")\n",
        "!mkdir rust\n",
        "os.chdir(\"/content/rust\")\n",
        "\n",
        "# Copy ONNX models\n",
        "!cp ../text-classify.onnx .\n",
        "!cp ../embeddings.onnx .\n",
        "\n",
        "# Save copy of Bert Tokenizer\n",
        "tokenizer.save_pretrained(\"bert\")\n",
        "\n",
        "# Install Rust\n",
        "!apt-get install rustc\n",
        "\n",
        "!mkdir -p src"
      ],
      "execution_count": 49,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "c7hz--Gne6Oa",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "39635f79-2071-403c-f833-9152907c2d1a"
      },
      "source": [
        "%%writefile Cargo.toml\n",
        "[package]\n",
        "name = \"onnx-test\"\n",
        "version = \"1.0.0\"\n",
        "description = \"\"\"\n",
        "ONNX Runtime Rust test\n",
        "\"\"\"\n",
        "edition = \"2018\"\n",
        "\n",
        "[dependencies]\n",
        "onnxruntime = { version = \"0.0.14\"}\n",
        "tokenizers = { version = \"0.10.1\"}"
      ],
      "execution_count": 50,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Overwriting Cargo.toml\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "cellView": "form",
        "id": "_8fdRvO1fFBm",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "79407b9d-8934-4cf2-c3cf-21e0a68954ca"
      },
      "source": [
        "#@title\n",
        "%%writefile src/main.rs\n",
        "use onnxruntime::environment::Environment;\n",
        "use onnxruntime::GraphOptimizationLevel;\n",
        "use onnxruntime::ndarray::{Array2, Axis};\n",
        "use onnxruntime::tensor::OrtOwnedTensor;\n",
        "\n",
        "use std::env;\n",
        "\n",
        "use tokenizers::decoders::wordpiece::WordPiece as WordPieceDecoder;\n",
        "use tokenizers::models::wordpiece::WordPiece;\n",
        "use tokenizers::normalizers::bert::BertNormalizer;\n",
        "use tokenizers::pre_tokenizers::bert::BertPreTokenizer;\n",
        "use tokenizers::processors::bert::BertProcessing;\n",
        "use tokenizers::tokenizer::{Result, Tokenizer, EncodeInput};\n",
        "\n",
        "fn tokenize(text: String, inputs: usize) -> Vec<Array2<i64>> {\n",
        "    // Load tokenizer\n",
        "    let mut tokenizer = Tokenizer::new(Box::new(\n",
        "        WordPiece::from_files(\"bert/vocab.txt\")\n",
        "            .build()\n",
        "            .expect(\"Vocab file not found\"),\n",
        "    ));\n",
        "\n",
        "    tokenizer.with_normalizer(Box::new(BertNormalizer::default()));\n",
        "    tokenizer.with_pre_tokenizer(Box::new(BertPreTokenizer));\n",
        "    tokenizer.with_decoder(Box::new(WordPieceDecoder::default()));\n",
        "    tokenizer.with_post_processor(Box::new(BertProcessing::new(\n",
        "        (\n",
        "            String::from(\"[SEP]\"),\n",
        "            tokenizer.get_model().token_to_id(\"[SEP]\").unwrap(),\n",
        "        ),\n",
        "        (\n",
        "            String::from(\"[CLS]\"),\n",
        "            tokenizer.get_model().token_to_id(\"[CLS]\").unwrap(),\n",
        "        ),\n",
        "    )));\n",
        "\n",
        "    // Encode input text\n",
        "    let encoding = tokenizer.encode(EncodeInput::Single(text), true).unwrap();\n",
        "\n",
        "    let v1: Vec<i64> = encoding.get_ids().to_vec().into_iter().map(|x| x as i64).collect();\n",
        "    let v2: Vec<i64> = encoding.get_attention_mask().to_vec().into_iter().map(|x| x as i64).collect();\n",
        "    let v3: Vec<i64> = encoding.get_type_ids().to_vec().into_iter().map(|x| x as i64).collect();\n",
        "\n",
        "    let ids = Array2::from_shape_vec((1, v1.len()), v1).unwrap();\n",
        "    let mask = Array2::from_shape_vec((1, v2.len()), v2).unwrap();\n",
        "    let tids = Array2::from_shape_vec((1, v3.len()), v3).unwrap();\n",
        "\n",
        "    return if inputs > 2 { vec![ids, mask, tids] } else { vec![ids, mask] };\n",
        "}\n",
        "\n",
        "fn predict(text: String, softmax: bool) -> Vec<f32> {\n",
        "    // Start onnx session\n",
        "    let environment = Environment::builder()\n",
        "        .with_name(\"test\")\n",
        "        .build().unwrap();\n",
        "\n",
        "    // Derive model path\n",
        "    let model = if softmax { \"text-classify.onnx\" } else { \"embeddings.onnx\" };\n",
        "\n",
        "    let mut session = environment\n",
        "        .new_session_builder().unwrap()\n",
        "        .with_optimization_level(GraphOptimizationLevel::Basic).unwrap()\n",
        "        .with_number_threads(1).unwrap()\n",
        "        .with_model_from_file(model).unwrap();\n",
        "\n",
        "    let inputs = tokenize(text, session.inputs.len());\n",
        "\n",
        "    // Run inference and print result\n",
        "    let outputs: Vec<OrtOwnedTensor<f32, _>> = session.run(inputs).unwrap();\n",
        "    let output: &OrtOwnedTensor<f32, _> = &outputs[0];\n",
        "\n",
        "    let probabilities: Vec<f32>;\n",
        "    if softmax {\n",
        "        probabilities = output\n",
        "            .softmax(Axis(1))\n",
        "            .iter()\n",
        "            .copied()\n",
        "            .collect::<Vec<_>>();\n",
        "    }\n",
        "    else {\n",
        "        probabilities= output\n",
        "            .iter()\n",
        "            .copied()\n",
        "            .collect::<Vec<_>>();\n",
        "    }\n",
        "\n",
        "    return probabilities;\n",
        "}\n",
        "\n",
        "fn similarity(v1: &Vec<f32>, v2: &Vec<f32>) -> f64 {\n",
        "    let mut dot = 0.0;\n",
        "    let mut norm1 = 0.0;\n",
        "    let mut norm2 = 0.0;\n",
        "\n",
        "    for x in 0..v1.len() {\n",
        "        dot += v1[x] * v2[x];\n",
        "        norm1 += v1[x].powf(2.0);\n",
        "        norm2 += v2[x].powf(2.0);\n",
        "    }\n",
        "\n",
        "    return dot as f64 / (norm1.sqrt() * norm2.sqrt()) as f64\n",
        "}\n",
        "\n",
        "fn main() -> Result<()> {\n",
        "    // Tokenize input string\n",
        "    let args: Vec<String> = env::args().collect();\n",
        "\n",
        "    if args.len() <= 2 {\n",
        "      let v1 = predict(args[1].to_string(), true);\n",
        "      println!(\"{:?}\", v1);\n",
        "    }\n",
        "    else {\n",
        "      let v1 = predict(args[1].to_string(), false);\n",
        "      let v2 = predict(args[2].to_string(), false);\n",
        "      println!(\"{:?}\", similarity(&v1, &v2));\n",
        "    }\n",
        "\n",
        "    Ok(())\n",
        "}"
      ],
      "execution_count": 51,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Overwriting src/main.rs\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "OdfQFY-MiA-n"
      },
      "source": [
        "## Run Text Classification in Rust with ONNX"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "b0ymX4ftgWcT",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "3446c3e7-a5d4-4572-ad9e-f463a1d4a7d7"
      },
      "source": [
        "!cargo run \"I am happy\" 2> /dev/null\n",
        "!cargo run \"I am mad\" 2> /dev/null"
      ],
      "execution_count": 52,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "[0.0011003953, 0.99889964]\n",
            "[0.9976444, 0.0023555849]\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NKccz6bBiIgW"
      },
      "source": [
        "The command above tokenizes the input and runs inference with a text classification model previously created using a Rust ONNX inference session. \n",
        "\n",
        "As a reminder, the text classification model is judging sentiment using two labels, 0 for negative to 1 for positive. The results above shows the probability of each label per text snippet."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1D1kN0yNiEg7"
      },
      "source": [
        "## Build sentence embeddings and compare similarity in Rust with ONNX"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "A9p6F_ODhenH",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "3dfff62e-d910-433f-c8de-c0b2311141a6"
      },
      "source": [
        "!cargo run \"I am happy\" \"I am glad\" 2> /dev/null"
      ],
      "execution_count": 53,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "0.8583641740656903\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TQ7Wvn0OiRr4"
      },
      "source": [
        "The sentence embeddings model produces vectors that can be used to compare semantic similarity, -1 being most dissimilar and 1 being most similar. \n",
        "\n",
        "Once again, this is 100% Rust, no API or remote calls. And yes, still think it's amazing!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-_FNKUWtjLsO"
      },
      "source": [
        "# Wrapping up\n",
        "\n",
        "This notebook covered how to export models to ONNX using txtai. These models were then run in Python, JavaScript, Java and Rust. Golang was also evaluated but there doesn't currently appear to be a stable enough ONNX runtime available. \n",
        "\n",
        "This method provides a way to train and run machine learning models using a number of programming languages on a number of platforms.\n",
        "\n",
        "The following is a non-exhaustive list of use cases. \n",
        "\n",
        "*   Build locally executed models for mobile/edge devices\n",
        "*   Run models with Java/JavaScript/Rust development stacks when teams prefer not to add Python to the mix\n",
        "*   Export models to ONNX for Python inference to improve CPU performance and/or reduce number of software dependencies"
      ]
    }
  ]
}
